pacman::p_load(sf,tmap,spdep,tidyverse, dplyr, mapview, sfdep, stplanr, glue, DT)Take Home Exercise 1
Overview
The recent shift in payment being made more digital, companies and organisations can more easily collect data and information that are linked to consumer habits. The transportation industry including public transport such as buses has also lean into this phenomenon. The information collected include travelling patterns that can help companies plan for more efficient routes or where heavy ridership is to be expected.
Objectives
Exploratory Spatial Data Analysis (ESDA) hold tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate Local Indicators of Spatial Association (GLISA) and Emerging Hot Spot Analysis (EHSA) to undercover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.
Task
Here we will utilise bus travelling data at different time duration for plotting out geospatial data and analysing them using various statistical tools.
Geovisualisation and Analysis
Computing passenger trips at the hexagonal level for the following time intervals:
- Weekday morning peak, 6am to 9am
- Weekday afternoon peak, 5pm to 8pm
- Weekend/holiday morning peak, 11am to 2pm
- Weekend/holiday evening peak, 4pm to 7pm
Display the geographical distribution using choropleth maps of the hexagons.
Combine all of the passenger trips made by all of the bus stops within a hexagon together
Local Indicators of Spatial Association(LISA) Analysis
Compute the number of ridership from bus stops that belong to a single hexagon and display the LISA maps.
Load Packages and Data
Load packages
Here we will load the packages needed for this exercise and their respective functions - sf: - tmap: - spdep: - tidyverse: - dplyr: - mapview: - sfdep:
Loading data
Loading aspatial table
Here we will read all of the ridership from different bus stops in Oct 2023 and assign it to the variable.
odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")We will then extract the information from the following and assign them to different variables.
| Day | Duration | Variable name |
|---|---|---|
| Weekdays | 6am - 9am | weekdayAM_6_9 |
| Weekdays | 5pm - 8pm | weekdayPM_5_8 |
| Weekends/Holidays | 11am - 2pm | weekendAM_11_14 |
| Weekends/Holidays | 4pm - 7pm | weekendPM_4_7 |
weekdayAM_6_9 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
weekdayPM_5_8 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
weekendAM_11_14 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
weekendPM_16_19 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 19) %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))Loading Geospatial data
Next we will import all of the bus stops and their coordinates and attached it to the busstop variable.
busstop <- st_read(dsn = "data/geospatial", layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`C:\Users\Lian Khye\Desktop\MITB\Geospatial\geartooth\ISSS624\Take_home_Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
We will first rename the bus stop column title for easier data joining.
colnames(busstop)[colnames(busstop) == "BUS_STOP_N"] <- "ORIGIN_PT_CODE"We will also import the layout of Singapore for excluding busstops that are not found in Singapore.
mpsz <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`C:\Users\Lian Khye\Desktop\MITB\Geospatial\geartooth\ISSS624\Take_home_Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
After that we will create the hexagons that will create the map layout. The hexagons will be shaped 250 x 250 cell size. All of the hexagons will also be given a grid id name that can be used for identifying each individual grid.
center <- st_centroid(busstop)
area_honeycomb_grid <- st_make_grid(center, cellsize = c(250 * sqrt(3), 250 * 2), what = "polygons", square = FALSE)
honeycomb_grid_sf <- st_sf(area_honeycomb_grid) %>%
mutate(grid_id = 1:length(lengths(area_honeycomb_grid)))Data processing
Assigning individual bus stop to hexagons
First we will assign the bus stop point geometry data to each polygon using st_intersection(). The function assigns all of the points to a polygon by the point-set intersection of two geometries. Additional information here.
valid_busstop <- st_intersection(busstop, mpsz)
busstop_hex <- st_intersection(valid_busstop, honeycomb_grid_sf) %>%
st_drop_geometry()Duplication check
Here we will check for the presence of any duplication before we further process the data.
duplicate1 <- weekdayAM_6_9 %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate2 <- weekdayPM_5_8 %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate3 <- weekendAM_11_14 %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate4 <- weekendPM_16_19 %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()We can see which data points are duplicated here.
c(duplicate1,duplicate2,duplicate3,duplicate4)$ORIGIN_PT_CODE
character(0)
$TRIPS
numeric(0)
$ORIGIN_PT_CODE
character(0)
$TRIPS
numeric(0)
$ORIGIN_PT_CODE
character(0)
$TRIPS
numeric(0)
$ORIGIN_PT_CODE
character(0)
$TRIPS
numeric(0)
Finally we only keep data points that are unique using the unique() function.
unique_weekdayAM <- unique(weekdayAM_6_9)
unique_weekdayPM <- unique(weekdayPM_5_8)
unique_weekendAM <- unique(weekendAM_11_14)
unique_weekendPM <- unique(weekendPM_16_19)Trip tabulation
Next we will then join the variables that we created earlier that contains the total number of trips at different time intervals and the busstop_hex variable together using grid_id column title that they have in common. The total number of trips made from each hexagon is then summed up together and placed under a new column named TOT_TRIPS.
count_weekdayAM_6_9 <- left_join(unique_weekdayAM , busstop_hex) %>%
group_by(grid_id) %>%
summarise(TOT_TRIPS = sum(TRIPS))
count_weekdayPM_5_8 <- left_join(unique_weekdayPM , busstop_hex) %>%
group_by(grid_id) %>%
summarise(TOT_TRIPS = sum(TRIPS))
count_weekendAM_11_14 <- left_join(unique_weekendAM , busstop_hex) %>%
group_by(grid_id) %>%
summarise(TOT_TRIPS = sum(TRIPS))
count_weekendPM_16_19 <- left_join(unique_weekendPM , busstop_hex) %>%
group_by(grid_id) %>%
summarise(TOT_TRIPS = sum(TRIPS))Reassign polygon information
We will the reassign the polygon information from the hexagonal map that we have created earlier.
poly_weekdayAM_6_9 <- left_join(honeycomb_grid_sf,count_weekdayAM_6_9)
poly_weekdayPM_5_8 <- left_join(honeycomb_grid_sf,count_weekdayPM_5_8)
poly_weekendAM_11_14 <- left_join(honeycomb_grid_sf,count_weekendAM_11_14)
poly_weekendPM_16_19 <- left_join(honeycomb_grid_sf,count_weekendPM_16_19)Filter for empty trips
Following that we will filter hexagons that have no trips to obtain only valid hexagons for mapping.
grid_weekdayAM <- poly_weekdayAM_6_9 %>%
filter(TOT_TRIPS > 0)
grid_weekdayPM <- poly_weekdayPM_5_8 %>%
filter(TOT_TRIPS > 0)
grid_weekendAM <- poly_weekendAM_11_14 %>%
filter(TOT_TRIPS > 0)
grid_weekendPM <- poly_weekendPM_16_19 %>%
filter(TOT_TRIPS > 0)Choropleth map
Here we will plot the choropleth map for the different time intervals. We will be using tmap_mode(“plot”) to create an interactive map. Although we will be coding in accessories such as the compass, they will not be displayed in the interactive map. However by writing them first, we can display them in subsequent maps once we view them in plot mode.
tmap_mode("view")
mapA <- tm_shape(grid_weekdayAM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Blues",
title = "Passenger trips") +
tm_layout(main.title = "Passenger trips generated Weekday 6am-9am",
main.title.position = "center",
main.title.size = 0.7,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 1) +
tm_scale_bar() +
tm_grid(alpha =0.2) +
tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA",
position = c("left", "bottom"))
mapAThe total number of ridership range from 1 to 357043 per hexagon. The range of the data are divided to quantile range bands for clearer distinction between ridership of each hexagon.
mapB <- tm_shape(grid_weekdayPM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Reds",
title = "Passenger trips") +
tm_layout(main.title = "Passenger trips generated Weekday 5pm-8pm",
main.title.position = "center",
main.title.size = 0.7,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 1) +
tm_scale_bar() +
tm_grid(alpha =0.2) +
tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA",
position = c("left", "bottom"))
mapBThe total number of ridership range from 1 to 568845 per hexagon.
mapC <- tm_shape(grid_weekendAM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Greens",
title = "Passenger trips") +
tm_layout(main.title = "Passenger trips generated Weekend/holidays 11am-2pm",
main.title.position = "center",
main.title.size = 0.7,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 1) +
tm_scale_bar() +
tm_grid(alpha =0.2) +
tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA",
position = c("left", "bottom"))
mapCThe total number of ridership range from 1 to 117609 per hexagon.
mapD <- tm_shape(grid_weekendPM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Purples",
title = "Passenger trips") +
tm_layout(main.title = "Passenger trips generated Weekend/holidays 4pm-7pm",
main.title.position = "center",
main.title.size = 0.7,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 1) +
tm_scale_bar() +
tm_grid(alpha =0.2) +
tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA",
position = c("left", "bottom"))
mapDThe total number of ridership range from 1 to 114410 per hexagon.
Plot maps
We can also utilise a plot mode by using tmap_mode(“plot”).
tmap_mode("plot")
mapA
This allows for other accessories to be added such as compass and scales which might be useful depending on the application or use of the map.
Choropleth map discussion
We can see that the total ridership over the weekends is lower than the weekday counterparts despite both being classified as peak hours. This difference is likely due to people travelling to or from work on the weekdays as compared to the weekends where there will be less traffic as people choose to stay at home or electing to travel to districts where it may be more accessible by other transportations such as trains.
LISA
Determining the K-nearest neighbour
As some of the data points are isolated or are far away from each other, it is better to use the adaptive distance weighting method to determine the number of neighbours. Here we will use 6 neighbours as a repesentative of 6 sides to a hexagon.
adap_weekdayAM <- grid_weekdayAM %>%
mutate(nb = st_knn(area_honeycomb_grid,
k=6),
wt = st_weights(nb),
.before = 1)
adap_weekdayPM <- grid_weekdayPM %>%
mutate(nb = st_knn(area_honeycomb_grid,
k=6),
wt = st_weights(nb),
.before = 1)
adap_weekendAM <- grid_weekendAM %>%
mutate(nb = st_knn(area_honeycomb_grid,
k=6),
wt = st_weights(nb),
.before = 1)
adap_weekendPM <- grid_weekendPM %>%
mutate(nb = st_knn(area_honeycomb_grid,
k=6),
wt = st_weights(nb),
.before = 1)Visualise Adaptive distance map
Here we will visualise the choropleth map of the nearest neighbour for the different time intervals. We will plot it using view mode with the regular choropleth map of the left and the choropleth map with the adaptive distance weighted map on the right.
We can mouse over each of the grid in the adaptive distance weighted map to see who the neighbours are, however the neighbours seen is from the index number of the weighted neighbour variable we generated above and not the grid number.
tmap_mode("view")
adapA <- tm_shape(adap_weekdayAM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Blues",
title = "Passenger trips")
tmap_arrange(mapA, adapA, asp=1, ncol=2)adapB <- tm_shape(adap_weekdayPM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Reds",
title = "Passenger trips")
tmap_arrange(mapB, adapB, asp=1, ncol=2)adapC <- tm_shape(adap_weekendAM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Greens",
title = "Passenger trips")
tmap_arrange(mapC, adapC, asp=1, ncol=2)adapD <- tm_shape(adap_weekendPM)+
tm_fill("TOT_TRIPS",
style = "quantile",
palette = "Purples",
title = "Passenger trips")
tmap_arrange(mapD, adapD, asp=1, ncol=2)Local Moran’s I
Next we will calculate the local Moran’s I of the total number of trips of each hexagon using the local_moran() function of the sfdep package.
lisa_weekdayAM <- adap_weekdayAM %>%
mutate(local_moran = local_moran(
TOT_TRIPS, nb, wt,nsim = 99),
.before = 1) %>%
unnest(local_moran)
lisa_weekdayPM <- adap_weekdayPM %>%
mutate(local_moran = local_moran(
TOT_TRIPS, nb, wt,nsim = 99),
.before = 1) %>%
unnest(local_moran)
lisa_weekendAM <- adap_weekendAM %>%
mutate(local_moran = local_moran(
TOT_TRIPS, nb, wt,nsim = 99),
.before = 1) %>%
unnest(local_moran)
lisa_weekendPM <- adap_weekendPM %>%
mutate(local_moran = local_moran(
TOT_TRIPS, nb, wt,nsim = 99),
.before = 1) %>%
unnest(local_moran)The output will be a data fram containing the ii, eii, var_ii, z_ii, p_ii, p_ii_sim and p_folded_sum.
Visualisation of local Moran’s I and p-value
Next we will plot the choropleth maps using the **ii* and p_ii_sim field.
tmap_mode("plot")
map1_weekdayAM <- tm_shape(lisa_weekdayAM) +
tm_fill("ii") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "local Moran's I of TOT_TRIPS(Weekday 6am to 9am)",
main.title.size = 0.8)
map2_weekdayAM <- tm_shape(lisa_weekdayAM) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "p-value of local Moran's I of TOT_TRIPS(Weekday 6am to 9am)",
main.title.size = 0.8)
tmap_arrange(map1_weekdayAM, map2_weekdayAM, ncol = 2)
tmap_mode("plot")
map1_weekdayPM <- tm_shape(lisa_weekdayPM) +
tm_fill("ii") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "local Moran's I of TOT_TRIPS(Weekday 5pm to 8pm)",
main.title.size = 0.8)
map2_weekdayPM <- tm_shape(lisa_weekdayPM) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "p-value of local Moran's I of TOT_TRIPS(Weekday 5pm to 8pm)",
main.title.size = 0.8)
tmap_arrange(map1_weekdayPM, map2_weekdayPM, ncol = 2)
tmap_mode("plot")
map1_weekendAM <- tm_shape(lisa_weekendAM) +
tm_fill("ii") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "local Moran's I of TOT_TRIPS(Weekend/Holidays 11am to 2pm)",
main.title.size = 0.8)
map2_weekendAM <- tm_shape(lisa_weekendAM) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "p-value of local Moran's I of TOT_TRIPS(Weekend/Holidays 11am to 2pm)",
main.title.size = 0.8)
tmap_arrange(map1_weekendAM, map2_weekendAM, ncol = 2)
tmap_mode("plot")
map1_weekendPM <- tm_shape(lisa_weekendPM) +
tm_fill("ii") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "local Moran's I of TOT_TRIPS(Weekend/Holidays 4pm to 7pm)",
main.title.size = 0.8)
map2_weekendPM <- tm_shape(lisa_weekendPM) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "p-value of local Moran's I of TOT_TRIPS(Weekend/Holidays 4pm to 7pm)",
main.title.size = 0.8)
tmap_arrange(map1_weekendPM, map2_weekendPM, ncol = 2)
Visualising LISA map
Finally we will create the LISA map. It is created using the local Moran’s I and their p-values. We will be using the mean for plotting.
The data that will be created will have 4 categories made of 2 outliers and 2 clusters.
Outliers:
- High-Low: High total trips with low neighbours
- Low-High: Low total trips with high neighbours
Clusters:
- High-High: High total trips with high neighbours
- Low-Low: Low total trips with low neighbours
More information can be found here.
tmap_mode("view")
lisa_sig_weekdayAM <- lisa_weekdayAM %>%
filter(p_ii_sim < 0.05)
tmap_mode("plot")
tm_shape(lisa_weekdayAM) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_weekdayAM) +
tm_fill("mean") +
tm_borders(alpha = 0.4)
highest_grid_weekdayAM <- lisa_weekdayAM %>%
group_by(mean) %>%
slice_max(TOT_TRIPS) %>%
select(10, 15,16)
lowest_grid_weekdayAM <- lisa_weekdayAM %>%
group_by(mean) %>%
slice_min(TOT_TRIPS) %>%
select(10, 15,16)
HL_weekdayAM <- busstop_hex %>%
left_join(highest_grid_weekdayAM, by = "grid_id") %>%
filter(grepl("High-Low", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(HL = paste(LOC_DESC, collapse = ", "))
LH_weekdayAM <- busstop_hex %>%
left_join(lowest_grid_weekdayAM, by = "grid_id") %>%
filter(grepl("Low-High", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(LH = paste(LOC_DESC, collapse = ", "))
HH_weekdayAM <- busstop_hex %>%
left_join(highest_grid_weekdayAM, by = "grid_id") %>%
filter(grepl("High-High", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(Hh = paste(LOC_DESC, collapse = ", "))
LL_weekdayAM <- busstop_hex %>%
left_join(lowest_grid_weekdayAM, by = "grid_id") %>%
filter(grepl("Low-Low", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(LL = paste(LOC_DESC, collapse = ", "))
glue("The highest volume of riders originating from bus stops with high-low outliers are:\n {HL_weekdayAM}.\n\nThe lowest volume of riders originating from bus stops with low-high outliers are:\n{LH_weekdayAM}\n\nThe highest volume of riders originating from bus stops with high-high clusters are:\n {HH_weekdayAM}.\n\nThe lowest volume of riders originating from bus stops with low-low clusters are:\n{LL_weekdayAM}")The highest volume of riders originating from bus stops with high-low outliers are:
OPP W'LANDS CIVIC CTR, WOODLANDS REG INT, WOODLANDS REG INT, W'LANDS CIVIC CTR.
The lowest volume of riders originating from bus stops with low-high outliers are:
OPP JEM
The highest volume of riders originating from bus stops with high-high clusters are:
BOON LAY INT.
The lowest volume of riders originating from bus stops with low-low clusters are:
3NC S'PORE, AFT LP 161, OPP LP 160, OPP SCDF NEE SOON CAMP, BEF TPE, AFT OLD TAMPINES RD
tmap_mode("view")
lisa_sig_weekdayPM <- lisa_weekdayPM %>%
filter(p_ii_sim < 0.05)
tmap_mode("plot")
tm_shape(lisa_weekdayPM) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_weekdayPM) +
tm_fill("mean") +
tm_borders(alpha = 0.4)
highest_grid_weekdayPM <- lisa_weekdayPM %>%
group_by(mean) %>%
slice_max(TOT_TRIPS) %>%
select(10, 15,16)
lowest_grid_weekdayPM <- lisa_weekdayPM %>%
group_by(mean) %>%
slice_min(TOT_TRIPS) %>%
select(10, 15,16)
HL_weekdayPM <- busstop_hex %>%
left_join(highest_grid_weekdayPM, by = "grid_id") %>%
filter(grepl("High-Low", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(HL = paste(LOC_DESC, collapse = ", "))
LH_weekdayPM <- busstop_hex %>%
left_join(lowest_grid_weekdayPM, by = "grid_id") %>%
filter(grepl("Low-High", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(LH = paste(LOC_DESC, collapse = ", "))
HH_weekdayPM <- busstop_hex %>%
left_join(highest_grid_weekdayPM, by = "grid_id") %>%
filter(grepl("High-High", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(Hh = paste(LOC_DESC, collapse = ", "))
LL_weekdayPM <- busstop_hex %>%
left_join(lowest_grid_weekdayPM, by = "grid_id") %>%
filter(grepl("Low-Low", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(LL = paste(LOC_DESC, collapse = ", "))
glue("The highest volume of riders originating from bus stops with high-low outliers are:\n {HL_weekdayPM}.\n\nThe lowest volume of riders originating from bus stops with low-high outliers are:\n{LH_weekdayPM} \n\nThe highest volume of riders originating from bus stops with high-high clusters are:\n {HH_weekdayPM}.\n\nThe lowest volume of riders originating from bus stops with low-low clusters are:\n{LL_weekdayPM}")The highest volume of riders originating from bus stops with high-low outliers are:
OPP W'LANDS CIVIC CTR, WOODLANDS REG INT, WOODLANDS REG INT, W'LANDS CIVIC CTR.
The lowest volume of riders originating from bus stops with low-high outliers are:
OPP JEM
The highest volume of riders originating from bus stops with high-high clusters are:
BOON LAY INT.
The lowest volume of riders originating from bus stops with low-low clusters are:
OPP LP 173, B26 LIM CHU KANG RD, AFT LP 161, OPP LP 160
tmap_mode("view")
lisa_sig_weekendAM <- lisa_weekendAM %>%
filter(p_ii_sim < 0.05)
tmap_mode("plot")
tm_shape(lisa_weekendAM) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_weekendAM) +
tm_fill("mean") +
tm_borders(alpha = 0.4)
highest_grid_weekendAM <- lisa_weekendAM %>%
group_by(mean) %>%
slice_max(TOT_TRIPS) %>%
select(10, 15,16)
lowest_grid_weekendAM <- lisa_weekendAM %>%
group_by(mean) %>%
slice_min(TOT_TRIPS) %>%
select(10, 15,16)
HL_weekendAM <- busstop_hex %>%
left_join(highest_grid_weekendAM, by = "grid_id") %>%
filter(grepl("High-Low", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(HL = paste(LOC_DESC, collapse = ", "))
LH_weekendAM <- busstop_hex %>%
left_join(lowest_grid_weekendAM, by = "grid_id") %>%
filter(grepl("Low-High", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(LH = paste(LOC_DESC, collapse = ", "))
HH_weekendAM <- busstop_hex %>%
left_join(highest_grid_weekendAM, by = "grid_id") %>%
filter(grepl("High-High", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(Hh = paste(LOC_DESC, collapse = ", "))
LL_weekendAM <- busstop_hex %>%
left_join(lowest_grid_weekendAM, by = "grid_id") %>%
filter(grepl("Low-Low", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(LL = paste(LOC_DESC, collapse = ", "))
glue("The highest volume of riders originating from bus stops with high-low outliers are:\n {HL_weekendAM}.\n\nThe lowest volume of riders originating from bus stops with low-high outliers are:\n{LH_weekendAM}\n\nThe highest volume of riders originating from bus stops with high-high clusters are:\n {HH_weekendAM}.\n\nThe lowest volume of riders originating from bus stops with low-low clusters are:\n{LL_weekendAM}")The highest volume of riders originating from bus stops with high-low outliers are:
OPP W'LANDS CIVIC CTR, WOODLANDS REG INT, WOODLANDS REG INT, W'LANDS CIVIC CTR.
The lowest volume of riders originating from bus stops with low-high outliers are:
OPP JEM
The highest volume of riders originating from bus stops with high-high clusters are:
BOON LAY INT.
The lowest volume of riders originating from bus stops with low-low clusters are:
STARDUST ENGRG, AFT MARINA GDNS DR
tmap_mode("view")
lisa_sig_weekendPM <- lisa_weekendPM %>%
filter(p_ii_sim < 0.05)
tmap_mode("plot")
tm_shape(lisa_weekendPM) +
tm_polygons() +
tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_weekendPM) +
tm_fill("mean") +
tm_borders(alpha = 0.4)
highest_grid_weekendPM <- lisa_weekendPM %>%
group_by(mean) %>%
slice_max(TOT_TRIPS) %>%
select(10, 15,16)
lowest_grid_weekendPM <- lisa_weekendPM %>%
group_by(mean) %>%
slice_min(TOT_TRIPS) %>%
select(10, 15,16)
HL_weekendPM <- busstop_hex %>%
left_join(highest_grid_weekendPM, by = "grid_id") %>%
filter(grepl("High-Low", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(HL = paste(LOC_DESC, collapse = ", "))
LH_weekendPM <- busstop_hex %>%
left_join(lowest_grid_weekendPM, by = "grid_id") %>%
filter(grepl("Low-High", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(LH = paste(LOC_DESC, collapse = ", "))
HH_weekendPM <- busstop_hex %>%
left_join(highest_grid_weekendPM, by = "grid_id") %>%
filter(grepl("High-High", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(Hh = paste(LOC_DESC, collapse = ", "))
LL_weekendPM <- busstop_hex %>%
left_join(lowest_grid_weekendPM, by = "grid_id") %>%
filter(grepl("Low-Low", mean, ignore.case = TRUE)) %>%
select(3)%>%
summarise(LL = paste(LOC_DESC, collapse = ", "))
glue("The highest volume of riders originating from bus stops with high-low outliers are:\n {HL_weekendPM}.\n\nThe lowest volume of riders originating from bus stops with low-high outliers are:\n{LH_weekendPM}\n\nThe highest volume of riders originating from bus stops with high-high clusters are:\n {HH_weekendPM}.\n\nThe lowest volume of riders originating from bus stops with low-low clusters are:\n{LL_weekendPM}")The highest volume of riders originating from bus stops with high-low outliers are:
OPP W'LANDS CIVIC CTR, WOODLANDS REG INT, WOODLANDS REG INT, W'LANDS CIVIC CTR.
The lowest volume of riders originating from bus stops with low-high outliers are:
LIANG HUAT IND CPLX
The highest volume of riders originating from bus stops with high-high clusters are:
BOON LAY INT.
The lowest volume of riders originating from bus stops with low-low clusters are:
BEF BENOI SECTOR, AFT BENOI SECTOR, OPP LP 173, B26 LIM CHU KANG RD, AFT TRACK 11, OPP TRACK 11, AFT JLN BAHTERA
Summarise outliers and clusters
We will then compile all of the outliers and clusters from the different time intervals together for easier comparison. We will be using the highest ridership volume for high-low and high-high grids and lowest ridership volume for low-high and low-low grids.
compiled_weekdayAM <-data.frame(HL_weekdayAM, LH_weekdayAM, HH_weekdayAM, LL_weekdayAM)
compiled_weekdayPM <-data.frame(HL_weekdayPM, LH_weekdayPM, HH_weekdayPM, LL_weekdayPM)
compiled_weekendAM <-data.frame(HL_weekendAM, LH_weekendAM, HH_weekendAM, LL_weekendAM)
compiled_weekendPM <-data.frame(HL_weekendPM, LH_weekendPM, HH_weekendPM, LL_weekendPM)
combined <- rbind(compiled_weekdayAM, compiled_weekdayPM, compiled_weekendAM, compiled_weekendPM) %>%
setNames(c("High-Low", "Low-High", "High-High", "Low-Low")) %>%
rownames_to_column("Time Period") %>%
mutate(`Time Period` = c("Weekdays 6am to 9am", "Weekdays 5pm to 8pm", "Weekends/Holidays 11am to 2pm", "Weekends/Holidays 5pm to 8pm")) %>%
column_to_rownames(var = "Time Period")
datatable(combined)We can see that the highest commuter volume can be seen at interchanges which makes sense as those bus stops cater to the most number of buses as compared to regular bus stops. Additionally they serve high passenger volume at across all intervals, likely due to the number of residential districts found near those interchanges.
There are 2 entries for Woodlands interchange as there are 2 bus stops with the same name found within the same hexagon.
The grids that are found in relatively isolated areas such as areas without residential or working facilities also reflects in the low ridership volume we see above. The bus stop, OPP JEM, that is situated next to Jurong East interchange likely experience low ridership due to it being situated next to a major bus interchange. With a high passenger volume bus stop situated next to it likely contributed it to be consistently categorised as a low-high entry.
Additional processing
Passenger flow visualisation
Origin and destination grouping
We can also take a look at passenger glow between the different hexes by plotting the desire lines. We will first need to obtain the total number of trips of each intervals that are grouped by their origin followed by their destination.
des_weekdayAM_6_9 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
des_weekdayPM_5_8 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
des_weekendAM_11_14 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
des_weekendPM_16_19 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 19) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))Joining destination table and grid
Since we have drop
Using the grid that we have created in the beginning, we will now join the data that contains the destination together with the grid.
combn_des_weekdayAM_6_9 <- left_join(des_weekdayAM_6_9,busstop_hex) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_GRID = grid_id,
DESTIN_BS = DESTINATION_PT_CODE)
combn_des_weekdayPM_5_8 <- left_join(des_weekdayPM_5_8,busstop_hex) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_GRID = grid_id,
DESTIN_BS = DESTINATION_PT_CODE)
combn_des_weekendAM_11_14 <- left_join(des_weekendAM_11_14,busstop_hex) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_GRID = grid_id,
DESTIN_BS = DESTINATION_PT_CODE)
combn_des_weekendPM_16_19 <- left_join(des_weekendPM_16_19,busstop_hex) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_GRID = grid_id,
DESTIN_BS = DESTINATION_PT_CODE)Removing duplicates
We will then remove any duplicates using the unique() function.
uni_des_weekdayAM_6_9 <- unique(combn_des_weekdayAM_6_9)
uni_des_weekdayPM_5_8 <- unique(combn_des_weekdayPM_5_8)
uni_des_weekendAM_11_14 <- unique(combn_des_weekendAM_11_14)
uni_des_weekendPM_16_19 <- unique(combn_des_weekendPM_16_19)Rejoin hexagonal information
We can then rejoin the hexagonal information back using left_join() function.
poly_des_weekdayAM_6_9 <- left_join(uni_des_weekdayAM_6_9 , busstop_hex,
by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekdayPM_5_8 <- left_join(uni_des_weekdayPM_5_8 , busstop_hex,
by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekendAM_11_14 <- left_join(uni_des_weekendAM_11_14 , busstop_hex,
by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekendPM_16_19 <- left_join(uni_des_weekendPM_16_19 , busstop_hex,
by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))Recheck unique data
We will then recheck for unique data points by running unique() again.
uniP_des_weekdayAM_6_9 <- unique(poly_des_weekdayAM_6_9)
uniP_des_weekdayPM_5_8 <- unique(poly_des_weekdayPM_5_8)
uniP_des_weekendAM_11_14 <- unique(poly_des_weekendAM_11_14)
uniP_des_weekendPM_16_19 <- unique(poly_des_weekendPM_16_19)Sum total trips
We will then sum up the total number of trips made from a bus stop of origin to the destination using group_by() for sorting. Putting multiple arguments into the function allows for the sub categorising the data. This way we can track the drop off point of the passengers.
ori_des_weekdayAM_6_9 <- uniP_des_weekdayAM_6_9 %>%
rename(DESTIN_GRID = grid_id) %>%
drop_na() %>%
group_by(ORIGIN_GRID, DESTIN_GRID) %>%
summarise(PEAK = sum(TRIPS))
ori_des_weekdayPM_5_8 <- uniP_des_weekdayPM_5_8 %>%
rename(DESTIN_GRID = grid_id) %>%
drop_na() %>%
group_by(ORIGIN_GRID, DESTIN_GRID) %>%
summarise(PEAK = sum(TRIPS))
ori_des_weekendAM_11_14 <- uniP_des_weekendAM_11_14 %>%
rename(DESTIN_GRID = grid_id) %>%
drop_na() %>%
group_by(ORIGIN_GRID, DESTIN_GRID) %>%
summarise(PEAK = sum(TRIPS))
ori_des_weekendPM_16_19 <- uniP_des_weekendPM_16_19 %>%
rename(DESTIN_GRID = grid_id) %>%
drop_na() %>%
group_by(ORIGIN_GRID, DESTIN_GRID) %>%
summarise(PEAK = sum(TRIPS))Visualisation
We will first remove any intra-hexagonal travel.
R_weekdayAM_6_9 <- ori_des_weekdayAM_6_9[ori_des_weekdayAM_6_9$ORIGIN_GRID!=ori_des_weekdayAM_6_9$DESTIN_GRID,]
R_weekdayPM_5_8 <- ori_des_weekdayPM_5_8[ori_des_weekdayPM_5_8$ORIGIN_GRID!=ori_des_weekdayPM_5_8$DESTIN_GRID,]
R_weekendAM_11_14 <- ori_des_weekendAM_11_14[ori_des_weekendAM_11_14$ORIGIN_GRID!=ori_des_weekendAM_11_14$DESTIN_GRID,]
R_weekendPM_16_19 <- ori_des_weekendPM_16_19[ori_des_weekendPM_16_19$ORIGIN_GRID!=ori_des_weekendPM_16_19$DESTIN_GRID,]Create desire lines
Next we will visualise all of the flow or connections between different bus stops from a subzone to another using the od2line() function from the stplanr package.
flow_weekdayAM_6_9 <- od2line(flow = R_weekdayAM_6_9,
zones = honeycomb_grid_sf,
zone_code = "grid_id")
flow_weekdayPM_5_8 <- od2line(flow = R_weekdayPM_5_8,
zones = honeycomb_grid_sf,
zone_code = "grid_id")
flow_weekendAM_11_14 <- od2line(flow = R_weekendAM_11_14,
zones = honeycomb_grid_sf,
zone_code = "grid_id")
flow_weekendPM_16_19 <- od2line(flow = R_weekendPM_16_19,
zones = honeycomb_grid_sf,
zone_code = "grid_id")Visualise desire lines
We can then visualise this flow using the following code. Since the passenger flow is high, it is better to limit the visualisation. Here we will use a flow passenger flow from hexagons to hexagons of 1500 or more.
tmap_mode("view")
tm_shape(grid_weekdayAM) +
tm_polygons() +
flow_weekdayAM_6_9 %>%
filter(PEAK >= 1500) %>%
tm_shape() +
tm_lines(lwd = "PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3)tmap_mode("view")
tm_shape(grid_weekdayPM) +
tm_polygons() +
flow_weekdayPM_5_8 %>%
filter(PEAK >= 1500) %>%
tm_shape() +
tm_lines(lwd = "PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3)tmap_mode("view")
tm_shape(grid_weekendAM) +
tm_polygons() +
flow_weekendAM_11_14 %>%
filter(PEAK >= 1500) %>%
tm_shape() +
tm_lines(lwd = "PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3)tmap_mode("view")
tm_shape(grid_weekendPM) +
tm_polygons() +
flow_weekendPM_16_19 %>%
filter(PEAK >= 1500) %>%
tm_shape() +
tm_lines(lwd = "PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3)